SiSoftware Sandra Q & A - Cache & Memory Benchmark

SiSoftware Sandra - The Diagnostic Tool, Q & A - Cache & Memory Benchmark

This document answers some frequently asked questions about Sandra. Please read the Help File as well!

This module builds on the technology of the well-known Memory Benchmark module; see that module for more information. This topic deals exclusively with the differences between the two modules.

Q: Why does it take so long to run the test?
A: To support SMP, SMT (Hyper-Threading), etc., the framework is quite complex and thus has significant overhead. To get a true index, the tests must be run many times and an index computed from the distribution of the results, which yields a stable index. Generally, this benchmark should take 5 to 10 times as long as the Memory Benchmark.
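
As a rough illustration of how many runs can be reduced to one stable figure, the sketch below times a block copy repeatedly and takes a trimmed mean of the results. This is not Sandra's actual method, which is not described here; the names `time_copy` and `stable_index`, the 1 MB block, and the 20% trim are all assumptions made for the example.

```python
import statistics
import time


def time_copy(block_size: int) -> float:
    """Return the bandwidth in MB/s of one copy of `block_size` bytes."""
    src = bytearray(block_size)
    start = time.perf_counter()
    _ = bytes(src)  # one pass over the block
    elapsed = time.perf_counter() - start
    return block_size / max(elapsed, 1e-9) / 1e6


def stable_index(samples, trim=0.2):
    """Trimmed mean: drop the lowest and highest `trim` fraction, average the rest."""
    ordered = sorted(samples)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(kept)


if __name__ == "__main__":
    # Hypothetical usage: 50 runs on a 1 MB block (sizes chosen arbitrarily).
    runs = [time_copy(1 << 20) for _ in range(50)]
    print(f"index: {stable_index(runs):.0f} MB/s")
```

Dropping the extremes keeps a single slow run (a context switch, for instance) from dragging the index down, which is one reason a distribution-based index needs many runs.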

Q: Why is the memory index (i.e. using large blocks > L2/L3 cache) lower than the Memory Benchmark index?
A: The index is lower because streaming/buffering/block pre-fetch is not used to increase performance. The test is the same regardless of block size, whereas the best results would require different techniques for blocks that fit in the data caches and for blocks that go out to memory.

The memory index should correspond to the legacy ALU/FPU tests in the Memory Benchmark. On modern systems you must disable EMMX/SSE/SSE2 instructions to fall back to these tests.
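
To make "the same test regardless of block size" concrete, here is a minimal sketch that runs one unchanged copy loop over a range of block sizes, with no streaming or pre-fetch tuning for the larger ones. The function names and size range are assumptions for the example, not Sandra's internals. In Python the interpreter overhead dominates small blocks, so the output shows the structure of such a sweep rather than real cache or memory bandwidth.

```python
import time


def block_sizes(lo=4 * 1024, hi=64 * 1024 * 1024):
    """Powers-of-two block sizes from `lo` to `hi` bytes, inclusive."""
    sizes = []
    size = lo
    while size <= hi:
        sizes.append(size)
        size *= 2
    return sizes


def bandwidth_mb_s(block_size, total_bytes):
    """Copy about `total_bytes` in pieces of `block_size`, same kernel for every size."""
    src = bytearray(block_size)
    passes = max(1, total_bytes // block_size)
    start = time.perf_counter()
    for _ in range(passes):
        _ = bytes(src)  # identical copy step whether the block fits in cache or not
    elapsed = time.perf_counter() - start
    return passes * block_size / max(elapsed, 1e-9) / 1e6


def sweep(sizes, total_bytes=64 * 1024 * 1024):
    """Map each block size to its measured bandwidth in MB/s."""
    return {size: bandwidth_mb_s(size, total_bytes) for size in sizes}


if __name__ == "__main__":
    for size, mbps in sweep(block_sizes()).items():
        print(f"{size // 1024:>6} kB  {mbps:10.0f} MB/s")
```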

Q: Why doesn't this module use streaming/buffering/block pre-fetch?
A: These techniques are very useful when streaming large amounts of data, but not when small blocks are involved, as in this test.

Q: Why don't I get higher scores with Hyper-Threading/SMT enabled?
A: SMT does NOT help in memory transfers. The bandwidth available to each CPU is the same, so using all logical CPUs would only increase overhead, resulting in lower scores. We're looking into using SMT for prefetching in future versions of the benchmark.

Q: Why is there no MMX test?
A: Both MMX & FPU work on 64 bits of data. Unless streaming instructions are used, there is no compelling reason to use MMX instead of FPU. Moreover, all the tests (like the Memory Benchmark) use 64-bit floats, while MMX supports integer data only (packed elements of up to 32 bits).

Q: Why does the P4 get such a boost from SSE(2) while the PIII does not get any?
A: Large transfer sizes (128-bit) work better on the NetBurst architecture than smaller ones (32/64-bit). The PIII reaches its limit with normal 64-bit transfers. Put another way, the P4 needs SSE(2) code, not legacy code, to reach its full potential.
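
As a small worked example of what the transfer size means in practice, the sketch below counts how many load/store operations are needed to move one block at each width. The function name and the 4 kB block are assumptions for illustration; the operation count alone does not predict bandwidth on any particular CPU.

```python
def transfers_needed(block_bytes: int, width_bits: int) -> int:
    """Number of transfers of `width_bits` each needed to cover `block_bytes`."""
    width_bytes = width_bits // 8
    return -(-block_bytes // width_bytes)  # ceiling division


if __name__ == "__main__":
    for width in (32, 64, 128):
        print(f"{width:>3}-bit: {transfers_needed(4096, width)} transfers per 4 kB block")
```

A 4 kB block takes 1024 transfers at 32 bits, 512 at 64 bits, and 256 at 128 bits, so SSE(2) code issues half as many operations as 64-bit FPU code for the same data.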